Comparing and Combining Generative and Posterior Probability Models: Some Advances in Sentence Boundary Detection in Speech
Abstract
We compare and contrast two models for detecting sentence-like units in continuous speech, using both acoustic and lexical information. The first approach is a hidden Markov sequence model built on N-grams; it uses maximum likelihood estimation and relies on model interpolation to combine different representations of the data. The second approach models the posterior probabilities of the target classes directly, is therefore discriminative, and integrates multiple knowledge sources in the maximum entropy (maxent) framework. Both models combine lexical, syntactic, and prosodic information. We develop a technique for integrating pretrained probability models into the maxent framework, and show that this approach can improve, if only slightly, on a state-of-the-art HMM-based system for the sentence-boundary detection task. A much more substantial improvement is obtained by combining the posterior probabilities of the two systems.
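As a rough illustration of what combining the posterior probabilities of two systems can look like, the sketch below linearly interpolates per-boundary posteriors and thresholds the result. The abstract does not say how the combination is done, so the interpolation scheme, the function names `combine_posteriors` and `detect_boundaries`, the `weight` and `threshold` parameters, and the example numbers are all assumptions made for illustration, not the authors' method or data.

```python
def combine_posteriors(p_hmm, p_maxent, weight=0.5):
    """Linearly interpolate per-word-boundary posteriors from two systems.

    `weight` is the share given to the first system (hypothetical parameter).
    """
    if len(p_hmm) != len(p_maxent):
        raise ValueError("posterior sequences must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return [weight * a + (1.0 - weight) * b for a, b in zip(p_hmm, p_maxent)]


def detect_boundaries(posteriors, threshold=0.5):
    """Mark a sentence boundary wherever the posterior reaches the threshold."""
    return [p >= threshold for p in posteriors]


# Made-up posteriors for three inter-word positions (illustrative only).
hmm_posteriors = [0.9, 0.2, 0.6]
maxent_posteriors = [0.7, 0.1, 0.3]

combined = combine_posteriors(hmm_posteriors, maxent_posteriors)
labels = detect_boundaries(combined)
print(labels)
```

In practice the interpolation weight and decision threshold would be tuned on held-out data; equal weighting is used here only to keep the sketch simple.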
Related resources
Comparing and Combining Modeling Techniques for Sentence Segmentation of Spoken Czech Using Textual and Prosodic Information
This paper deals with automatic sentence boundary detection in spoken Czech using both textual and prosodic information. This task is important to make automatic speech recognition (ASR) output more readable and easier for downstream language processing modules. We compare and combine three statistical models – hidden Markov model, maximum entropy, and adaptive boosting. We evaluate these metho...
Improving Automatic Sentence Boundary Detection with Confusion Networks
We extend existing methods for automatic sentence boundary detection by leveraging multiple recognizer hypotheses in order to provide robustness to speech recognition errors. For each hypothesized word sequence, an HMM is used to estimate the posterior probability of a sentence boundary at each word boundary. The hypotheses are combined using confusion networks to determine the overall most lik...
Voice-based Age and Gender Recognition using Training Generative Sparse Model
Abstract: Gender recognition and age detection are important problems in telephone speech processing to investigate the identity of an individual using voice characteristics. In this paper a new gender and age recognition system is introduced based on generative incoherent models learned using sparse non-negative matrix factorization and atom correction post-processing method. Similar to genera...
A deep neural network approach for sentence boundary detection in broadcast news
This paper presents a deep neural network (DNN) approach to sentence boundary detection in broadcast news. We extract prosodic and lexical features at each inter-word position in the transcripts and learn a sequential classifier to label these positions as either boundary or non-boundary. This work is realized by a hybrid DNN-CRF (conditional random field) architecture. The DNN accepts prosodic...
Sentence Boundary Detection in Broadcast Speech Transcripts
This paper presents an approach to identifying sentence boundaries in broadcast speech transcripts. We describe finite state models that extract sentence boundary information statistically from text and audio sources. An n-gram language model is constructed from a collection of British English news broadcasts and scripts. An alternative model is estimated from pause duration information in spee...